Numerical Attribute Extraction from Clinical Texts

Authors

  • Sarath P. R.
  • Sunil Mandhan
  • Yoshiki Niwa

Abstract

This paper describes an information extraction system that extends the system developed by team Hitachi for the "Disease/Disorder Template filling" task organized by the ShARe/CLEF eHealth Evaluation Lab 2014. The extension module focuses on extracting numerical attributes and values from discharge summary records and on associating each value with its correct attribute. We solve the problem in two steps. The first step, extracting numerical attributes and values, is implemented as a Named Entity Recognition (NER) model built with the Stanford NLP libraries. The second step, correctly associating attributes with values, is implemented as a relation extraction module in the Apache cTAKES framework. We integrated the Stanford NER model as a cTAKES pipeline component and used it in the relation extraction module. A Conditional Random Field (CRF) algorithm is used for NER and Support Vector Machines (SVM) for relation extraction. For attribute-value relation extraction, we observe 95% accuracy using NER alone and a combined accuracy of 87% with NER and SVM.
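The abstract splits the task into tagging attribute and value mentions, then deciding which attribute each value belongs to. The Python sketch below is a hypothetical illustration of the handoff between those two steps, not the authors' code. It assumes a BIO tag set (`B-ATTR`, `I-ATTR`, `B-VALUE`, `O`) and a handful of pair features, all of which are illustrative choices. The hand-written tags stand in for the output of the Stanford CRF model, and `nearest_attribute_baseline` is a simple rule standing in for the trained SVM, which would instead learn from features like these.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Span:
    """An entity mention covering tokens[start:end] (hypothetical label set)."""
    label: str  # "ATTR" for an attribute name, "VALUE" for a numerical value
    start: int
    end: int
    text: str


def bio_to_spans(tokens, tags):
    """Step 1 output: collapse BIO tags (B-ATTR, I-ATTR, B-VALUE, O, ...) into spans."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" flushes the last span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # the current entity keeps growing
        if label is not None:
            spans.append(Span(label, start, i, " ".join(tokens[start:i])))
        # B- tags (and stray I- tags) open a new span; "O" closes everything
        label, start = (None, None) if tag == "O" else (tag[2:], i)
    return spans


def candidate_pairs(spans):
    """Step 2 input: every (attribute, value) pair with simple, hypothetical features.

    A trained classifier (the abstract uses an SVM) would score these pairs.
    """
    attrs = [s for s in spans if s.label == "ATTR"]
    values = [s for s in spans if s.label == "VALUE"]
    pairs = []
    for a, v in product(attrs, values):
        left, right = (a, v) if a.start < v.start else (v, a)
        between = [s for s in spans if s.start >= left.end and s.end <= right.start]
        features = {
            "token_gap": right.start - left.end,  # tokens separating the mentions
            "value_after_attr": v.start > a.start,
            "entities_between": len(between),
        }
        pairs.append((a, v, features))
    return pairs


def nearest_attribute_baseline(pairs):
    """Rule-based stand-in for the trained classifier: link each value to the
    attribute with the fewest intervening entities, then the smallest gap,
    preferring attributes that precede the value."""
    best = {}
    for a, v, f in pairs:
        score = (f["entities_between"], f["token_gap"], not f["value_after_attr"])
        if v not in best or score < best[v][0]:
            best[v] = (score, a)
    return [(a.text, v.text) for v, (_, a) in best.items()]


if __name__ == "__main__":
    # Hand-written tags stand in for the CRF-based NER output.
    tokens = ["Blood", "pressure", "120/80", ",", "heart", "rate", "72", "."]
    tags = ["B-ATTR", "I-ATTR", "B-VALUE", "O", "B-ATTR", "I-ATTR", "B-VALUE", "O"]
    spans = bio_to_spans(tokens, tags)
    print(nearest_attribute_baseline(candidate_pairs(spans)))
    # [('Blood pressure', '120/80'), ('heart rate', '72')]
```

Proximity rules like this one tend to break on coordinated or reordered mentions, which is where a classifier trained over pair features can do better.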

Similar resources

A pattern learning-based method for temporal expression extraction and normalization from multi-lingual heterogeneous clinical texts

BACKGROUND Temporal expression extraction and normalization is a fundamental and essential step in clinical text processing and analyzing. Though a variety of commonly used NLP tools are available for medical temporal information extraction, few work is satisfactory for multi-lingual heterogeneous clinical texts. METHODS A novel method called TEER is proposed for both multi-lingual temporal e...

Using String Vector based KNN for Keyword Extraction

In this research, we propose the string vector based KNN as the approach to the keyword extraction. The keyword extraction may be viewed as an instance of word classification, encoding words into numerical vectors may cause the main problems, such as the huge dimensionality, the sparse distribution and the poor transparency, and the problems were solved by encoding texts into string vectors in ...

Text Mining and Big Data Analytics for Retrospective Analysis of Clinical Texts from Outpatient Care

This paper presents the results of an on-going research project for knowledge extraction from large corpora of clinical narratives in Bulgarian language, approximately 100 million of outpatient care notes. Entities with numerical values are mined in the free text and the extracted information is stored in a structured format. The Algorithms for retrospective analyses and big data analytics are ...

Relation extraction from clinical texts using domain invariant convolutional neural network

In recent years extracting relevant information from biomedical and clinical texts such as research articles, discharge summaries, or electronic health records have been a subject of many research efforts and shared challenges. Relation extraction is the process of detecting and classifying the semantic relation among entities in a given piece of texts. Existing models for this task in biomedic...

Stochastic Models for Surface Information Extraction in Texts

We describe in this paper the application of numerical machine learning techniques to the extraction of information from a collection of textual data. More precisely, we consider the modeling of text sequences with Hidden Markov Models (HMMs) and Multi-layer Perceptrons (MLPs) and show how these models can be used to perform specific surface extraction tasks (i.e. tasks which do not need in dep...


Journal title:
  • CoRR

Volume: abs/1602.00269

Publication date: 2016